5 research outputs found

Timing-Error Tolerance Techniques for Low-Power DSP: Filters and Transforms

Author: Whatmough PN
Publication venue: UCL (University College London)
Publication date: 28/09/2012

Low-power Digital Signal Processing (DSP) circuits are critical to commercial System-on-Chip design for battery-powered devices. Dynamic Voltage Scaling (DVS) of digital circuits can reclaim worst-case supply voltage margins for delay variation, reducing power consumption. However, removing static margins without compromising robustness is tremendously challenging, especially in an era of escalating reliability concerns due to continued process scaling. The Razor DVS scheme addresses these concerns by ensuring robustness using explicit timing-error detection and correction circuits. Nonetheless, the design of low-complexity and low-power error correction is often challenging. In this thesis, the Razor framework is applied to fixed-precision DSP filters and transforms. The inherent error tolerance of many DSP algorithms is exploited to achieve very low-overhead error correction. Novel error correction schemes for DSP datapaths are proposed, with very low-overhead circuit realisations. Two new approximate error correction approaches are proposed. The first is based on an adapted sum-of-products form that prevents errors in intermediate results reaching the output, while the second approach forces errors to occur only in less significant bits of each result by shaping the critical path distribution. A third approach is described that achieves exact error correction using time borrowing techniques on critical paths. Unlike previously published approaches, all three proposed approaches are suitable for high clock frequency implementations, as demonstrated with fully placed and routed FIR, FFT and DCT implementations in 90nm and 32nm CMOS. Design issues and theoretical modelling are presented for each approach, along with SPICE simulation results demonstrating power savings of 21–29%. Finally, the design of a baseband transmitter in 32nm CMOS for the Spectrally Efficient FDM (SEFDM) system is presented. SEFDM systems offer bandwidth savings compared to Orthogonal FDM (OFDM), at the cost of increased complexity and power consumption, which is quantified with the first VLSI architecture
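
The abstract's second approach confines timing errors to the less significant bits of each result. A minimal sketch of why that helps, assuming a timing error can be modelled as a single flipped bit in a fixed-point FIR output word, is given below. The filter taps, the sample values, and the helper names `fir`, `flip_bit`, and `worst_error` are hypothetical and are not taken from the thesis; the single-bit-flip model is an illustrative simplification, not the author's circuit.

```python
def fir(samples, taps):
    """Direct-form FIR filter on integers (stand-ins for fixed-point words)."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


def flip_bit(value, bit):
    """Assumed error model: one flipped bit in a result word."""
    return value ^ (1 << bit)


def worst_error(outputs, bit):
    """Largest absolute output error caused by flipping the given bit."""
    return max(abs(flip_bit(y, bit) - y) for y in outputs)


samples = [100, -50, 25, 300, -120, 80]  # hypothetical input samples
taps = [3, 5, 3]                         # hypothetical integer coefficients

clean = fir(samples, taps)
lsb_error = worst_error(clean, 0)    # error confined to the least significant bit
msb_error = worst_error(clean, 12)   # error in a more significant bit

print("clean outputs:", clean)
print("worst error, bit 0: ", lsb_error)
print("worst error, bit 12:", msb_error)
```

Under this model, a flip in bit k perturbs the result by exactly 2^k, so an error steered into the low-order bits degrades the output far less than the same error in a high-order bit.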

UCL Discovery

A 16-nm SoC for Noise-Robust Speech and NLP Edge AI Inference With Bayesian Sound Source Separation and Attention-Based DNNs

Author: Brooks D
Chai Y
Donato M
Hooper C
Ko GG
Rush AM
Tambe T
Wei GY
Whatmough PN
Yang EY
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 08/06/2022

The proliferation of personal artificial intelligence (AI)-assistant technologies with speech-based conversational AI interfaces is driving the exponential growth in the consumer Internet of Things (IoT) market. As these technologies are being applied to keyword spotting (KWS), automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) applications, it is of paramount importance that they provide uncompromising performance for context learning in long sequences, which is a key benefit of the attention mechanism, and that they work seamlessly in polyphonic environments. In this work, we present a 25-mm² system-on-chip (SoC) in 16-nm FinFET technology, codenamed SM6, which executes end-to-end speech-enhancing attention-based ASR and NLP workloads. The SoC includes: 1) FlexASR, a highly reconfigurable NLP inference processor optimized for whole-model acceleration of bidirectional attention-based sequence-to-sequence (seq2seq) deep neural networks (DNNs); 2) a Markov random field source separation engine (MSSE), a probabilistic graphical model accelerator for unsupervised inference via Gibbs sampling, used for sound source separation; 3) a dual-core Arm Cortex-A53 CPU cluster, which provides on-demand single instruction/multiple data (SIMD) fast Fourier transform (FFT) processing and performs various application logic (e.g., expectation–maximization (EM) algorithm and 8-bit floating-point (FP8) quantization); and 4) an always-on M0 subsystem for audio detection and power management. Measurement results demonstrate the efficiency ranges of 2.6–7.8 TFLOPs/W and 4.33–17.6 Gsamples/s/W for FlexASR and MSSE, respectively; MSSE denoising performance allowing a 6× smaller ASR model to be stored on-chip with negligible accuracy loss; and 2.24-mJ energy consumption while achieving real-time throughput, end-to-end, and per-frame ASR latencies of 18 ms

UCL Discovery

Receiver design combining iteration detection and ICI compensation for SEFDM

Author: I Darwazeh
I Kanaras
Min Jia
PN Whatmough
Qing Guo
RC Grammenos
T Xu
Xuemai Gu
Zhisheng Yin
Zhiying Wu
Publication venue: Springer Science and Business Media LLC

Crossref

Cloud-based efficient scheme for handwritten digit recognition

Author: A Boukharouba
A Dutt
A Graves
AH Toselli
Allah Ditta
B Soundes
B Walker
Chuangbai Xiao
E Mohebi
J-B Yang
KG Liakos
KS Younis
LF Polania
M Hanmandlu
MA Nielsen
PN Whatmough
Q Xu
Qurat ul Ain Farooq
R Al-Hmouz
S Ali
S Ercoli
Sana Sahiba
Saqib Ali
V Singhal
Y LeCun
Z Shaukat
Zeeshan Shaukat
Publication venue: Springer Science and Business Media LLC

Crossref

A multiplierless pruned DCT-like transformation for image and video compression that requires ten additions only

Author: A Oppenheim
AN Skodras
Arjuna Madanayake
CJ Tablada
DP Skinner
FM Bayer
Fábio M. Bayer
G Karakonstantis
GAF Seber
GJ Sullivan
I Carugati
J Markel
J Ohm
J Park
JH Kim
KR Rao
L Wang
MT Pourazad
N Kouadria
N Roma
PK Meher
PN Whatmough
R Airoldi
RE Blahut
Renato J. Cintra
RJ Cintra
S Bouguezel
Sunera Kulasekera
T Wiegand
TI Haweel
US Potluri
V Bhaskaran
V Britanak
V Lecuire
Vítor A. Coutinho
WH Chen
Y Huang
Z Wang
Publication venue: Springer Science and Business Media LLC

Crossref